A general notion of information-related complexity applicable to both natural and man-made
systems is proposed. The overall approach is to explicitly consider a rational agent
performing a certain task with a quantifiable degree of success. The complexity is defined
as the minimum (quasi-)quantity of information that is necessary to complete the task to
the given extent, measured by the corresponding loss. The complexity so defined is shown
to generalize the existing notion of statistical complexity when the system in question can 
be described by a discrete-time stochastic process. The proposed definition also applies, in 
particular, to optimization and decision making problems under uncertainty in which case it 
gives the agent a useful measure of the problem's "susceptibility" to additional information 
and allows for an estimation of the potential value of the latter. 



E-mail: eup2@lehigh.edu
E-mail: dpg3@lehigh.edu 






I. INTRODUCTION 



The general field of complex systems science is relatively new and has been developing
rapidly over the last three decades. One of the central notions it revolves around is,
understandably, that of complexity. The desire to define complexity before attempting to study
it is natural in this context. When the field of complex systems science began taking
shape, several notions of complexity were already in use. The most notable of them was the
Kolmogorov descriptional complexity, defined as the shortest length of a computer program
needed to reproduce the given object (in the simplest case represented by a string of symbols
from a given alphabet). It was quickly noted, however, that the Kolmogorov complexity is rather
a measure of randomness as opposed to complexity understood as the amount of appropriately 
defined "internal structure" possessed by the object [jj. It was also realized that a proper measure 
of complexity should be probabilistic since complexity as it is encountered in natural sciences and 
engineering is inherently linked to some sort of uncertainty. Following these realizations, several 
useful measures of complexity were developed, including set complexity, true measure complexity, 
effective measure complexity [4J], logical depth [j], f|, statistical complexity Q], sophistication [§]. 
Among these measures, statistical complexity is the one whose foundations and applications
were subsequently developed in the most detail (see [{J]). For this reason, we will use it later as a
"benchmark" complexity measure for systems that can be described by discrete-time stochastic 
processes. 

The main goal of the present work is to put the problem of complexity measures in an "en- 
gineering" problem-oriented context and, in particular, to demonstrate that a generalization of 
statistical complexity measure arises naturally when the proposed "engineering" complexity no- 
tion is applied to the case of a prediction task for discrete-time stochastic processes. The central 
figure of the proposed approach is an agent whose goal is to perform a certain task. Typical
examples of tasks are control, decision making, optimization, and prediction. We will also refer to
such tasks as problems. From a general practical point of view, a task can be made complex by several
of its aspects. It can be, for example, computationally complex (like a large scale deterministic 
optimization problem). It can be operationally complex (like that of building a complicated object 
given full instructions). We will not deal with such complexity types. Instead, we concentrate on 
information-related complexity, the measure of which is an appropriately defined quantity of some 
relevant information needed for completing the task with a given (also appropriately measured) 
degree of success. This implies that the task is initially associated with some degree of uncertainty 
that can be fully or partially removed by acquiring the appropriate information.

The rest of the paper is organized as follows. In Section II the general definition of task
complexity is given. Section III considers optimization under uncertainty (stochastic
optimization) as a task and derives the quasi-information complexity for optimization. Section IV
considers the task of predicting the output of stochastic processes and makes a connection to the
statistical complexity introduced in [7], which turns out to be just the task complexity taken at
zero value of the argument. Finally, Section V summarizes the contributions of the present work and
presents a brief informal discussion.

II. GENERAL FORMULATION: INFORMATION-RELATED TASK COMPLEXITY 

An agent is trying to accomplish a certain task and can, in principle, be helped by appropriate 
additional information. The quality (or degree) of the task accomplishment is assumed to be 
quantifiable by a real number and thus can be compared between any two instances of the task 
accomplishment. Respectively, a notion of real-valued loss can be defined as (the absolute value 
of) the difference between the degree of the task accomplishment in the given instance and the 
best possible one. The latter has to be defined appropriately given the specific task at hand as 
examples given later in the paper illustrate. 

Besides loss, any information $\mathfrak{I}$ that the agent can acquire is also assumed to possess
some real-valued measure $I(\mathfrak{I})$ of its effective quantity, which we will refer to as
quasi-quantity to distinguish it from measures of "pure" quantity such as Shannon entropy. The
appropriate form of the quasi-quantity measure $I(\mathfrak{I})$ has to be determined by considering
the specific nature of the task at hand so as to reflect its essential features. In particular, the
quasi-quantity measure may coincide with a traditional "pure" quantity measure for some tasks. The
agent is also assumed to be capable of making optimal use of the acquired information for the
purpose of minimizing the loss. Given the latter assumption, we can introduce the notion of the loss
$L(\mathfrak{I})$ of the specific information $\mathfrak{I}$ available to the agent.

It is natural to assume that, if the measure of information quantity adopted in the given context
is adequate, then smaller loss values can in general be obtained for larger values of the information 
quasi-quantity. Thus, to reduce the loss more, larger quantities of information would in general be 
needed. On the other hand, it is clear that, in general, different "pieces" of information of equal 
quasi-quantity would lead to different values of loss reduction. Therefore, if loss reduction at a 
minimum information "cost" is the agent's goal, it makes sense to search for the specific information 
that would allow the agent to achieve that goal. It is also rather natural to use the initial value
of the loss $L_0 = L(0)$ that the agent can obtain in the absence of any (additional) information as
a reference point for the reduced loss. Specifically, one can use the ratio $\gamma = L/L_0$ as a
natural dimensionless measure of the achieved loss reduction. The lowest information quasi-quantity
needed to achieve such a loss reduction can be associated with the proper quasi-information
complexity $C_\gamma$ of the task in question. This leads us to the following definition.

$$C_\gamma = \min_{\{\mathfrak{I} :\, L(\mathfrak{I}) \le \gamma L_0\}} I(\mathfrak{I}). \qquad (1)$$

Note that the minimization in (1) is performed over all possible "kinds", or "pieces", of information
that, if used optimally by the agent, would yield a value of loss not exceeding the given fraction
$\gamma$ of the original value $L_0$. Note also that the specific interpretations of the loss
$L(\cdot)$ and quantity $I(\cdot)$ are intentionally left open by the definition (1). The
understanding is that they have to be made more concrete by taking the "nature" and the details of
the system in question into account.
In the next two sections, we consider two more specific examples that illustrate the use of the 
proposed definition. 



III. OPTIMIZATION 



Consider a typical "engineering" case of optimization under uncertainty, with the expected value 
of the objective function being the main optimization criterion (stochastic optimization formula- 
tion). The overall task can be concisely written as follows. 

$$\min_{x \in X} \int_\Omega f(\omega, x)\, P(d\omega), \qquad (2)$$

where $X \subset \mathcal{D}$ is the set of all feasible solutions, i.e. the set satisfying all
(deterministic) constraints that may be present in the problem formulation (with $\mathcal{D}$ being
the space to which all solutions belong, such as a suitable Euclidean space); $\Omega$ is the base
space of possible values of input data parameters that are not known with certainty, often referred
to as a parameter space; $P$ is a probability measure on $(\Omega, \mathcal{F})$ (where $\mathcal{F}$
is a suitable sigma-algebra) that describes the initial state of information available to the agent;
$f: \Omega \times \mathcal{D} \to \mathbb{R}$ is a function, integrable on $\Omega$ for each
$x \in X$, describing the specific optimization objective.

The value of the initial loss $L_0$ then becomes simply

$$L_0 = L(P) = \int_\Omega \left( f(\omega, x^*_P) - f(\omega, x^*_\omega) \right) P(d\omega), \qquad (3)$$

where $x^*_P$ is a solution of (2) and $x^*_\omega$ is a solution of $\min_{x \in X} f(\omega, x)$
for the given $\omega$ of the parameter space $\Omega$.



As argued in [10] and [11], in the general context of problem (2), additional information can be
obtained by the agent by querying information sources: the agent formulates and asks questions and
receives the corresponding answers from the source(s). It has been further argued that the
appropriate meaning of information quasi-quantity is supplied by pseudoenergy, which provides a
real-valued measure for both question difficulty and answer depth functionals. Pseudoenergy is
appropriate in the given context because it takes into account the source's knowledge structure and
therefore allows the agent to correctly select optimal "pieces" of information to maximize their
effect on the given problem.
The role of "pieces" of information themselves is played by questions, which can be associated,
following the earlier developments of 14| and la, LL6], with partitions of the problem parameter
space $\Omega$. The corresponding quasi-quantity is then given by the question difficulty functional
$G(\Omega, \mathbf{C}, P)$, where the partition $\mathbf{C} = \{C_1, \dots, C_r\}$ is a collection of
subsets of $\Omega$ such that $C_i \cap C_j = \emptyset$ for $i \ne j$ and
$\bigcup_{i=1}^{r} C_i = \Omega$. The loss $L(\mathbf{C})$ corresponding to question $\mathbf{C}$ can
be interpreted as the minimum loss that an optimizing agent can obtain upon reception of a perfectly
accurate answer to question $\mathbf{C}$, and can easily be seen to take the form



$$L(\mathbf{C}, P) = \sum_{j=1}^{r} P(C_j) \int_{C_j} \left( f(\omega, x^*_{P_{C_j}}) - f(\omega, x^*_\omega) \right) dP_{C_j}(\omega), \qquad (4)$$



where $P_{C_j}$ is the conditional measure corresponding to the subset $C_j \in \mathbf{C}$. It is
straightforward to show that $L(\mathbf{C}) \le L_0$. Correspondingly, the informational complexity
$C_\gamma$ of the optimization task (2) takes the form

$$C_\gamma = \min_{\{\mathbf{C} :\, L(\mathbf{C}, P) \le \gamma L(P)\}} G(\Omega, \mathbf{C}, P), \qquad (5)$$

where minimization in (5) is over all possible partitions of $\Omega$. The form of informational
complexity (5) assumes a specific source knowledge structure, which is encoded in the form of the
question difficulty functional $G(\cdot)$. In particular, as was shown in [10], in an isotropic (but
in general non-uniform) case, the functional $G(\cdot)$ depends, besides the "initial" probability
measure $P$, on an integrable function $u: \Omega \to \mathbb{R}_+$ on the parameter space $\Omega$:



$$G(\Omega, \mathbf{C}, P) = -\sum_{j=1}^{r} u(C_j)\, P(C_j) \log P(C_j),$$

where $u(C_j) = \frac{\int_{C_j} u(\omega)\, dP(\omega)}{P(C_j)}$. If the knowledge structure is
unknown, it appears reasonable to assume the most symmetric case (uniform and isotropic) in order to
obtain a "benchmark" (or "universal") task complexity. In this case, the question difficulty reduces
to Shannon entropy $H(P(\mathbf{C})) = H(P(C_1), \dots, P(C_r))$ of the discrete probability
distribution induced by the partition $\mathbf{C}$, and the informational task complexity $C_\gamma$
becomes simply

$$C_\gamma = \min_{\{\mathbf{C} :\, L(\mathbf{C}, P) \le \gamma L(P)\}} H(P(\mathbf{C})). \qquad (6)$$

TABLE I. Quasi-information complexity $C_\gamma$ for some values of $\gamma$ for Example 1.
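For a finite parameter space, definition (6) can be evaluated by brute force. The sketch below is only an illustration under assumptions that are not in the original: the parameter space is a short list of equally likely values, the feasible set is a finite list of candidate decisions, the objective is a squared error, and logarithms are taken to base 2. All names (`set_partitions`, `partition_loss`, `task_complexity`, the toy data) are hypothetical.

```python
import math

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + smaller

def partition_loss(blocks, omegas, probs, solutions, f):
    """Discrete analogue of loss (4) for a minimization problem."""
    total = 0.0
    for block in blocks:
        # Best single decision given only the knowledge "omega is in this block".
        x_block = min(solutions,
                      key=lambda x: sum(probs[i] * f(omegas[i], x) for i in block))
        for i in block:
            best = min(f(omegas[i], x) for x in solutions)
            total += probs[i] * (f(omegas[i], x_block) - best)
    return total

def entropy_bits(ps):
    """Shannon entropy (base 2) of a discrete distribution."""
    return sum(-p * math.log2(p) for p in ps if p > 0)

def task_complexity(gamma, omegas, probs, solutions, f, tol=1e-12):
    """Smallest partition entropy whose loss is at most gamma * L_0, as in (6)."""
    indices = list(range(len(omegas)))
    initial_loss = partition_loss([indices], omegas, probs, solutions, f)
    best = math.inf
    for blocks in set_partitions(indices):
        loss = partition_loss(blocks, omegas, probs, solutions, f)
        if loss <= gamma * initial_loss + tol:
            block_probs = [sum(probs[i] for i in block) for block in blocks]
            best = min(best, entropy_bits(block_probs))
    return best

# Hypothetical toy problem: four equally likely parameter values, squared-error objective.
OMEGAS = [0, 1, 2, 3]
PROBS = [0.25] * 4
SOLUTIONS = [0, 1, 2, 3]

def squared_error(omega, x):
    return (x - omega) ** 2

for gamma in (1.0, 0.5, 0.0):
    print(gamma, task_complexity(gamma, OMEGAS, PROBS, SOLUTIONS, squared_error))
```

The enumeration grows as the Bell numbers, so this approach is only practical for very small parameter spaces; it is meant to make the roles of the loss constraint and the entropy objective concrete rather than to serve as a solver.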



A. Example 

Consider the stochastic optimization problem of the form 

$$\begin{aligned} \text{maximize} \quad & x_1 + \omega x_2 \\ \text{subject to} \quad & x_1^2 + x_2^2 \le 1 \\ & x_1 \ge 0, \; x_2 \ge 0, \end{aligned}$$

where the only uncertain parameter is $\omega$, which has a uniform distribution on the interval
$\Omega = [0, 1]$. The situation is depicted in Fig. 1 and is referred to as Example 1 in what
follows. It is straightforward to show that for a partition $\mathbf{C}$ of $\Omega$ into $r$
subintervals of equal measure, the loss (3) takes the form

$$L(\mathbf{C}, P) = \frac{1}{2}\left(\sqrt{2} + \ln(1 + \sqrt{2})\right) - \frac{1}{r} \sum_{j=1}^{r} \sqrt{1 + \left(\frac{2j - 1}{2r}\right)^2}.$$

On the other hand, if one uses (in the absence of information about possible information sources
and their knowledge structure) the uniform isotropic model, the difficulty functional reduces to
Shannon entropy and $G(\Omega, \mathbf{C}, P) = H(P(\mathbf{C})) = \log r$. The relationship between
the information quasi-quantity $H(P(\mathbf{C}))$ and the corresponding loss (rescaled so that
$L_0 = 1$) is shown in Fig. 2. The task complexity $C_\gamma$ for some values of the parameter
$\gamma$ is shown in Table I. In particular, the "full" task complexity $C_0$ for this example is
infinite.
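The dependence of the loss on $\log r$ for Example 1 can be tabulated numerically. The sketch below assumes the constraint set is the quarter of the unit disc and that the partition consists of $r$ equal subintervals of $[0, 1]$; under these assumptions the agent's best decision on a subinterval is the unit vector along $(1, \bar{\omega}_j)$, where $\bar{\omega}_j$ is the subinterval midpoint. Restricting attention to equal subintervals yields only an upper bound on $C_\gamma$. Function names are hypothetical and logarithms are base 2.

```python
import math

# E[ max_x (x1 + w * x2) ] = E[ sqrt(1 + w^2) ] for w uniform on [0, 1].
PERFECT_INFO_VALUE = 0.5 * (math.sqrt(2.0) + math.asinh(1.0))

def loss_equal_partition(r):
    """Loss after a perfect answer to 'which of r equal subintervals contains w?'."""
    achieved = sum(
        math.sqrt(1.0 + ((2 * j - 1) / (2 * r)) ** 2) for j in range(1, r + 1)
    ) / r
    return PERFECT_INFO_VALUE - achieved

INITIAL_LOSS = loss_equal_partition(1)

def complexity_upper_bound(gamma, r_max=2000):
    """log2 of the smallest r with loss <= gamma * L_0, or None if not found up to r_max."""
    for r in range(1, r_max + 1):
        if loss_equal_partition(r) <= gamma * INITIAL_LOSS:
            return math.log2(r)
    return None

for r in (1, 2, 4, 8, 16):
    print(r, math.log2(r), loss_equal_partition(r) / INITIAL_LOSS)
```

The rescaled loss falls roughly as $1/r^2$ but never reaches zero for finite $r$, which is consistent with the statement that the "full" complexity $C_0$ is infinite for this example.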

FIG. 1. Example 1. (Figure label: range of deterministic optimal solutions; axis label $x_1$.)

FIG. 2. Loss vs. information quasi-quantity for Example 1.

Suppose now that the parameter space $\Omega$ and the initial measure $P$ are the same but the task
is somewhat different. Specifically, the stochastic optimization problem has the linear form (see
Fig. 3 for an illustration). We refer to this example as Example 2.



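$$\begin{aligned} \text{maximize} \quad & x_1 + \omega x_2 \\ \text{subject to} \quad & x_1 + (\sqrt{2} - 1) x_2 \le 1 \\ & (\sqrt{2} - 1) x_1 + x_2 \le 1 \\ & x_1 \ge 0, \; x_2 \ge 0 \end{aligned}$$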

TABLE II. Quasi-information complexity $C_\gamma$ for some values of $\gamma$ for Example 2.

FIG. 3. Example 2.

In this case, as is easy to see, it is sufficient to consider questions corresponding to partitions
of $\Omega$ into two subsets only in order to fully remove the agent's uncertainty with respect to
the optimal solution, assuming that perfectly accurate answers to such questions are available.
Specifically, consider partitions of $\Omega$ of the form $\mathbf{C} = \{[0, a], (a, 1]\}$ where
$0 \le a \le \omega_c = \sqrt{2} - 1$. Then the loss (3) takes the form

$$L(\mathbf{C}, P) = \frac{(\omega_c - a)^2}{2\sqrt{2}}$$



and the information quasi-quantity becomes 

$$G(\Omega, \mathbf{C}, P) = H(P(\mathbf{C})) = H(a) = -a \log a - (1 - a) \log(1 - a).$$

The relationship between them is shown in Fig. 4. The task complexities for some values of $\gamma$
are shown in Table II. One can see that the "full" complexity $C_0$ for this example is finite.
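The two expressions for Example 2 can be combined into an explicit curve. The sketch below restricts attention to the two-set partitions $\{[0, a], (a, 1]\}$ with $0 \le a \le \omega_c$ described in the text; within that family the loss constraint $L \le \gamma L_0$ gives $a \ge \omega_c (1 - \sqrt{\gamma})$, and since the binary entropy is increasing on $[0, 1/2]$ the minimum is attained at that value of $a$. The helper `loss_numeric` is a hypothetical cross-check of the closed-form loss by midpoint integration; logarithms are base 2.

```python
import math

OMEGA_C = math.sqrt(2.0) - 1.0  # critical parameter value where the optimal vertex switches

def loss_closed_form(a):
    """Loss for the partition {[0, a], (a, 1]} with 0 <= a <= OMEGA_C."""
    return (OMEGA_C - a) ** 2 / (2.0 * math.sqrt(2.0))

def loss_numeric(a, n=100000):
    """Midpoint-rule integral of the optimality gap, used only as a cross-check."""
    total = 0.0
    for k in range(n):
        w = (k + 0.5) / n
        # Objective values at the vertices (1, 0), (1/sqrt2, 1/sqrt2) and (0, 1).
        best = max(1.0, (1.0 + w) / math.sqrt(2.0), w)
        # On [0, a] the agent picks (1, 0); on (a, 1] it picks (1/sqrt2, 1/sqrt2).
        chosen = 1.0 if w <= a else (1.0 + w) / math.sqrt(2.0)
        total += (best - chosen) / n
    return total

def binary_entropy(a):
    if a <= 0.0 or a >= 1.0:
        return 0.0
    return -a * math.log2(a) - (1.0 - a) * math.log2(1.0 - a)

def complexity(gamma):
    """C_gamma within the two-set family: entropy at the smallest admissible a."""
    a = OMEGA_C * (1.0 - math.sqrt(gamma))
    return binary_entropy(a)

for gamma in (1.0, 0.5, 0.1, 0.01, 0.0):
    print(gamma, round(complexity(gamma), 4))
```

Under these assumptions the "full" complexity is $C_0 = H(\sqrt{2} - 1)$, slightly below one bit, in agreement with the statement that it is finite and rather small.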
Let us now make some observations (and careful generalizations). 

• Even if the "uncertainty structure" (the parameter space $\Omega$ and the measure $P$) is precisely
the same, the task complexities may be different (and even very different) if the agent is
solving different problems. In the two examples just discussed, for instance, the complexity
$C_0$ is infinite for Example 1 while it is finite (and rather small) for Example 2.

• The "full" complexity Co typically does not tell the whole story about the "information 
content" of a problem. 

• The whole curve $C_\gamma$ is in principle needed to obtain a picture of the problem's "information
susceptibility", or, equivalently, the ability of additional information to improve the solution
quality.

• The knowledge of the curve $C_\gamma$, coupled with that of available additional information
sources and their knowledge structure (see [10, 11] for a discussion of the latter), allows the
agent to evaluate the effectiveness of additional information for the particular problem and its
corresponding value.

FIG. 4. Loss vs. information quasi-quantity for Example 2.

FIG. 5. Generation of observed symbols.



IV. STOCHASTIC PROCESSES AND STATISTICAL COMPLEXITY 



Consider now an application from a different domain: the case of discrete-time stochastic
processes. To make the discussion more concrete, let us consider a specific simple example (referred
to as Example A in what follows), which is borrowed from l?j]. Suppose the unobserved symbols
A and B are generated randomly with equal probability. The observed symbols 0 and 1
correspond to pairs of A and B as follows: $\{AA, AB, BB\} \to 1$, $\{BA\} \to 0$ (see Fig. 5 for an
illustration). For example, if AABAB was generated, the observer will see 1101.

The concept of $\epsilon$-machines was introduced in [?| and fully developed in {]] to describe the
intuitive (and practical) notion of complexity as the amount of underlying "structure" in the
system. $\epsilon$-machines consist of causal states and transitions between them (see the references
just cited for a detailed exposition). Each time a transition happens, an observed symbol is output.
$\epsilon$-machines can be thought of as functions on Markov chains. Causal states are equivalence
classes of previous histories: given two histories belonging to the same causal state $S$, the
conditional distribution of the future string of bits is the same. Thus $\epsilon$-machines are
deterministic, meaning that given a causal state and an output symbol, the next causal state is
uniquely determined. $\epsilon$-machines can in principle be reconstructed from a binary tree
representation of the system output. Considering longer symbol sequences (words) allows for
distinguishing more distinct causal states (which in turn requires longer output sample sizes). It
can be shown that, for our example, the full $\epsilon$-machine has the form shown in Fig. 6. The
transition probabilities for this $\epsilon$-machine have the form
$\Pr(1A^iB \to 1A^{i+1}B) = \frac{i+2}{2(i+1)}$, $\Pr(1A^iB \to 1A) = \frac{i}{2(i+1)}$ and
$\Pr(1A \to 1A^1B) = 1$. Any transition to the state $1A$ outputs a symbol 0; all other transitions
output 1.

FIG. 6. Exact $\epsilon$-machine for Example A.

Given an $\epsilon$-machine $M$ for the system, the statistical complexity is defined as the minimum
expected amount of information that needs to be kept track of for optimal prediction of the next
bit (which in general is imperfect):

$$C_\mu(M) = -\sum_{S \in M} P(S) \log P(S), \qquad (7)$$

where $P(S)$ is the (steady state) probability of the causal state $S$. The topological complexity
is a related quantity that reflects just the number of causal states in the $\epsilon$-machine:

$$C_\tau = \log |M|.$$

For our example, it is easy to see that $P(1A^iB) = \frac{i+1}{2^{i+2}}$, $i = 1, 2, \dots$, and
$P(1A) = 1/4$. Thus $C_\mu(M) = 2.71$. Note that the topological complexity $C_\tau = \log |M|$ of
this system is infinite.
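The quoted value of the statistical complexity can be checked numerically. The sketch below assumes that the causal state reached after $j$ ones following a zero has steady-state probability $(j+1)/2^{j+2}$, with $j = 0$ corresponding to the state $1A$; this form follows from counting the hidden strings compatible with the observed history and is an assumption of the sketch. The infinite chain is truncated at a hypothetical cutoff `J`, and logarithms are base 2.

```python
import math

J = 200  # truncation level; the neglected probability mass is far below double precision

def steady_state(j):
    """Probability of the causal state reached after j ones following a zero (j = 0 is 1A)."""
    return (j + 1) / 2 ** (j + 2)

probs = [steady_state(j) for j in range(J)]
total_mass = sum(probs)

# Statistical complexity (7): Shannon entropy of the causal-state distribution, in bits.
c_mu = -sum(p * math.log2(p) for p in probs)

print("total probability:", total_mass)
print("statistical complexity:", round(c_mu, 3))
```

The distribution sums to one and the entropy agrees with the value 2.71 given in the text, while the number of states with nonzero probability is unbounded, which is why the topological complexity diverges.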
The entropy rate for an $\epsilon$-machine (and the corresponding dynamic system) can be found as

$$h_\mu(M) = -\sum_{S \in M} P(S) \sum_{s \in \mathcal{A}} P_M(s|S) \log P_M(s|S), \qquad (8)$$

where $\mathcal{A}$ is the alphabet of output symbols ($\mathcal{A} = \{0, 1\}$ in this example). The
entropy rate can be thought of as the measure of uncertainty per symbol that cannot be removed by
finding the inherent structure of the system. For our example, the entropy rate is found to be equal
to 0.678.
We will also need the notion of the relative entropy rate $h_\mu(M|M')$ of an $\epsilon$-machine $M$
with respect to another $\epsilon$-machine [18] $M'$. Let $g: M' \to M$ be a map between states of
these two $\epsilon$-machines. Then the relative entropy rate $h_\mu(M|M')$ is defined as

$$h_\mu(M|M') = -\sum_{S' \in M'} P_{M'}(S') \sum_{s \in \mathcal{A}} P_{M'}(s|S') \log P_M(s|g(S')).$$

It is straightforward to show that

$$h_\mu(M|M') = h_\mu(M') + \sum_{S' \in M'} P_{M'}(S')\, KL(P_{M'}(S') \| P_M(S')), \qquad (9)$$

where $KL(P_{M'}(S') \| P_M(S')) = \sum_{s \in \mathcal{A}} P_{M'}(s|S') \log \frac{P_{M'}(s|S')}{P_M(s|g(S'))}$
is the corresponding Kullback-Leibler distance.

To make a link between task complexity and statistical complexity, we need to decide on a
suitable task. Since in this case we want to study the complexity of our system "in its own right",
the logical choice for the task is that of suitably defined prediction. We also need to define the
corresponding loss in a logical way. Several choices are in principle possible here, with quadratic
and logarithmic loss being the two popular ones in the prediction algorithms literature. We will use
the latter of the two. Specifically, we will use the following single bit loss function



$$l(s, \sigma) = \begin{cases} -\log \sigma, & \text{if } s = 1 \\ -\log(1 - \sigma), & \text{if } s = 0 \end{cases}$$

where $0 \le \sigma \le 1$ is the prediction. It is easy to see that if $p = \Pr(s = 1)$ then

$$\min_\sigma E[l(s, \sigma)] = H(p) \qquad (10)$$

with the minimum being achieved for $\sigma = p$. Furthermore, one can show that

$$E[l(s, \sigma)] = H(p) + KL(p \| \sigma), \qquad (11)$$

where $KL(p \| \sigma) = p \log \frac{p}{\sigma} + (1 - p) \log \frac{1 - p}{1 - \sigma}$ is the
Kullback-Leibler distance between distributions $(p, 1 - p)$ and $(\sigma, 1 - \sigma)$.
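Relations (10) and (11) are easy to verify numerically. The following sketch uses base-2 logarithms and hypothetical function names; the value $p = 0.25$ and the search grid are arbitrary choices made for illustration.

```python
import math

def expected_log_loss(p, sigma):
    """E[l(s, sigma)] when Pr(s = 1) = p and the prediction is sigma."""
    return -p * math.log2(sigma) - (1.0 - p) * math.log2(1.0 - sigma)

def binary_entropy(p):
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def kl_divergence(p, sigma):
    """Kullback-Leibler distance between (p, 1 - p) and (sigma, 1 - sigma)."""
    return p * math.log2(p / sigma) + (1.0 - p) * math.log2((1.0 - p) / (1.0 - sigma))

p = 0.25

# Relation (10): a grid search over predictions recovers sigma = p as the minimizer.
grid = [i / 1000 for i in range(1, 1000)]
best_sigma = min(grid, key=lambda s: expected_log_loss(p, s))
print("best prediction:", best_sigma)
print("minimum loss:", expected_log_loss(p, best_sigma), "entropy:", binary_entropy(p))

# Relation (11): the excess loss of any other prediction equals the KL distance.
for sigma in (0.1, 0.4, 0.9):
    excess = expected_log_loss(p, sigma) - binary_entropy(p)
    print(sigma, excess, kl_divergence(p, sigma))
```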

Comparing (10) and (11), we can see that the entropy rate for the given $\epsilon$-machine represents
the minimum possible long-term (i.e. expected) logarithmic prediction loss. In other words, this
is the minimum prediction loss that can be obtained by an optimally acting agent in possession of
complete information about the system. It is reasonable to identify the information $\mathfrak{I}$
from the general definition (1) with some $\epsilon$-machine describing the system's output.
Depending on the particular $\epsilon$-machine available, the agent may possess complete or partial
information. Let us denote the "full" $\epsilon$-machine corresponding to complete information by
$M^*$. Since the logarithmic loss cannot be reduced below that obtained from $M^*$, it makes sense to
set

$$L(M^*) = 0 \qquad (12)$$

and interpret $L(M)$ for any other $\epsilon$-machine as the expected logarithmic loss in excess of
$h_\mu(M^*)$. One can easily show that the expected logarithmic prediction loss for a system
functioning according to the "full" $\epsilon$-machine $M^*$ but predicted optimally with the help of
another $\epsilon$-machine $M$ is equal to the relative entropy rate $h_\mu(M|M^*)$. Using the adopted
loss definition (12), we have to subtract the minimum (non-removable) expected logarithmic loss
$h_\mu(M^*)$ to arrive at the loss corresponding to $M$. Taking into account the relationship (9), we
arrive at

$$L(M) = h_\mu(M|M^*) - h_\mu(M^*) = \sum_{S \in M^*} P_{M^*}(S)\, KL(P_{M^*}(S) \| P_M(S)). \qquad (13)$$

The information quasi-quantity corresponding to the $\epsilon$-machine $M$ can be defined, in the
absence of any information about possible sources and their knowledge structure, with the help of
the most symmetric model for the latter: uniform and isotropic. This leads, as was mentioned earlier,
to the Shannon entropy for the steady state distribution of the corresponding $\epsilon$-machine,
which is nothing else but the statistical complexity of $M$:

$$I(M) = -\sum_{S \in M} P(S) \log P(S) = C_\mu(M). \qquad (14)$$

In particular, if the information available to the agent is complete, i.e. $M = M^*$, then the
information quasi-quantity (14) coincides with the system statistical complexity (7) (and the loss
the agent can achieve vanishes).

Let us now find (or estimate) the quasi-information complexity for the prediction task for the
original example. The general definition (1) applied to this case reads

$$C_\gamma = \min_{\{M :\, L(M) \le \gamma L_0\}} I(M), \qquad (15)$$
since the role of information $\mathfrak{I}$ is played by various $\epsilon$-machines $M$. The
original loss $L_0$ here should be identified with the expected logarithmic prediction loss that can
be obtained without any knowledge of the underlying statistical mechanism, i.e. using a partial
$\epsilon$-machine with a single state. The minimum in (15) has to be taken over all possible
$\epsilon$-machines $M$ (and corresponding maps $g: M^* \to M$). Since the full $\epsilon$-machine
$M^*$ consists of causal states, it is sufficient to consider partial $\epsilon$-machines with states
corresponding to subsets of causal states of $M^*$. Specifically, let
$\kappa = \{\mathbb{S}_1, \dots, \mathbb{S}_m\}$ be a transition-consistent partition of the set
$\mathbb{S}$ of states of the $\epsilon$-machine $M^*$, i.e. a partition such that the symbols output
by any transition between two states of $M^*$ are the same for all states in the same subset of the
partition. An $\epsilon$-machine $M$ is obtained from any transition-consistent partition $\kappa$ of
the set of states of $M^*$ by associating a state $S_j$ of $M$ with the subset $\mathbb{S}_j$ of
$\kappa$ for $j = 1, \dots, m$. The transition probability $\Pr(S_j \to S_k)$ for $M$ is then
calculated as $\Pr(S_j \to S_k) = \sum_{S' \in \mathbb{S}_j} P(S') \Pr(S' \to S'')$, where $S''$ is
the state [20] in subset $\mathbb{S}_k$ with a nonzero transition probability from $S'$. We denote
the resulting $\epsilon$-machine by $M(\kappa)$. Given this construction, the relation (15) can be
rewritten as

$$C_\gamma = \min_{\{\kappa :\, L(M(\kappa)) \le \gamma L_0\}} I(M(\kappa)), \qquad (16)$$

where minimization is over all transition-consistent partitions of the state set of the full
$\epsilon$-machine $M^*$.

Finding the exact minimum in (16) appears to be a rather difficult problem since the number
of transition-consistent partitions of the state set of $M^*$ is in general exponential in the
number of causal states in $M^*$. Therefore here we settle for some reasonable choices of "partial"
information $M$ instead. This way, clearly, we will generally obtain upper bounds on the exact values
of $C_\gamma$. Fig. 7 shows several such "partial" $\epsilon$-machines that arise naturally in the
process of $\epsilon$-machine reconstruction from sample outputs of the system. Let us denote the
"partial" $\epsilon$-machine with $k + 1$ nodes by $M_k$. The map $g: M^* \to M_k$ for this case has
the form $g(1A^jB) = 1A^jB$ for $j < k$ and $g(1A^jB) = 1A^kB$ for $j \ge k$. The values of relative
entropy rate and statistical complexity of these $\epsilon$-machines are shown in Table III. The
dependence of the loss on the information quasi-quantity can be found from the data of Table III by
a simple rescaling (so that $L(M_1) = 1$). The results are shown in Fig. 8. The quasi-information
complexity $C_\gamma$ for different values of the parameter $\gamma$ can be read directly from the
plot. Some of the resulting values are shown in Table IV.
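The loss (13) and the quasi-quantity (14) of such truncated approximations can be estimated numerically. The sketch below is a reading of the construction under several assumptions that are not stated in the original: the causal state reached after $j$ ones following a zero has steady-state probability $(j+1)/2^{j+2}$ and emits 0 next with probability $j/(2(j+1))$; an approximation keeps the first $k$ states separate and lumps all remaining states into one; the lumped state predicts with the probability-weighted average emission probability; and the loss is rescaled by that of the single-state machine. The indexing of the approximations and the names used are hypothetical, and logarithms are base 2.

```python
import math

J = 200  # truncation level for the infinite chain of causal states

# j = 0 is the state 1A; j >= 1 is the state reached after j ones following a zero.
P = [(j + 1) / 2 ** (j + 2) for j in range(J)]   # steady-state probabilities
Q = [j / (2 * (j + 1)) for j in range(J)]        # Pr(next observed symbol is 0 | state j)

def h2(x):
    """Binary entropy in bits, with the convention 0 log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def kl(p, s):
    """Binary Kullback-Leibler distance KL(p || s) in bits, with 0 log 0 = 0."""
    out = 0.0
    if p > 0.0:
        out += p * math.log2(p / s)
    if p < 1.0:
        out += (1.0 - p) * math.log2((1.0 - p) / (1.0 - s))
    return out

# Entropy rate (8) of the exact machine.
ENTROPY_RATE = sum(p * h2(q) for p, q in zip(P, Q))

def approximation(k):
    """Return (quasi-quantity, loss) when states 0..k-1 are kept and states j >= k are lumped."""
    tail = range(k, J)
    tail_mass = sum(P[j] for j in tail)
    q_bar = sum(P[j] * Q[j] for j in tail) / tail_mass   # emission probability of the lumped state
    loss = sum(P[j] * kl(Q[j], q_bar) for j in tail)     # excess loss, as in (13)
    state_probs = P[:k] + [tail_mass]
    info = sum(-p * math.log2(p) for p in state_probs if p > 0.0)  # entropy, as in (14)
    return info, loss

_, SINGLE_STATE_LOSS = approximation(0)
print("entropy rate:", round(ENTROPY_RATE, 3))
for k in range(6):
    info, loss = approximation(k)
    print(k, round(info, 3), round(loss / SINGLE_STATE_LOSS, 4))
```

Each additional state that is split off increases the quasi-quantity and decreases the loss, so the printed pairs trace out an upper bound on the curve $C_\gamma$ in the sense discussed above.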

TABLE III. Relative entropy rates and statistical complexities for sequential approximations to the
exact $\epsilon$-machine for Example A.

FIG. 7. Sequential approximations to the exact $\epsilon$-machine of Example A.

FIG. 8. Loss vs. information quasi-quantity for Example A.

TABLE IV. Quasi-information complexity $C_\gamma$ for some values of $\gamma$ for Example A.

FIG. 9. Exact $\epsilon$-machine for Example B.

To consider a different example (called Example B in what follows), let us keep the topology
of the exact $\epsilon$-machine for the example just discussed but change the transition
probabilities so that the steady state probabilities do not decay as fast "down the line" (see
Fig. 9). Specifically,
let the transition probabilities be Pr(z — > i + 1) = Pr(z — >■ 0) = ^2 for £ = 1, . . . JV — 1, 
Pr(iV —7- N) = jj-^ and Pr(0 — > 1) = 1. Just like in Example A, a transition from a state to 
another state with a numerically higher label or to itself is accompanied by an output of 1, and a
transition from any state to state 0 outputs 0. Note that the number of states for this
$\epsilon$-machine has to be finite (but can be made arbitrarily large) in order for steady state
probabilities to be positive. One can easily see that the steady state probabilities for this
$\epsilon$-machine are $P(0) = \frac{1}{2(\psi(N+2) + \gamma_e - 1/2)}$ and
$P(i) = \frac{2}{i+1} P(0)$, where $\psi(x) = \frac{\Gamma'(x)}{\Gamma(x)}$ is the digamma function
and $\gamma_e \approx 0.577$ is the Euler constant. Let the "partial" $\epsilon$-machines
representing incomplete information for this example have the same form as in the original example
(see Fig. 10).
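The statistical complexity of Example B can be estimated directly from its steady-state distribution. The sketch below assumes that the steady-state probabilities have the form $P(i) = 2P(0)/(i+1)$ for $i = 1, \dots, N$ with $P(0) = 1/(2(\psi(N+2) + \gamma_e - 1/2))$, which is how the expressions above are read here. It uses the identity $\psi(N+2) + \gamma_e = \sum_{n=1}^{N+1} 1/n$ so that no special-function library is needed; the variable names are hypothetical and logarithms are base 2.

```python
import math

N = 1000  # number of states besides state 0, as in the numerical setting used for Example B

# psi(N + 2) + gamma_e equals the harmonic number H_{N+1}.
harmonic = sum(1.0 / n for n in range(1, N + 2))

p0 = 1.0 / (2.0 * (harmonic - 0.5))
probs = [p0] + [2.0 * p0 / (i + 1) for i in range(1, N + 1)]

total_mass = sum(probs)
c_mu = -sum(p * math.log2(p) for p in probs)

print("total probability:", total_mass)
print("statistical complexity:", round(c_mu, 2))
```

Under these assumptions the distribution is normalized and its entropy is close to 7.8 bits, noticeably larger than the 2.71 bits of Example A because the probabilities decay only as $1/(i+1)$.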

The probabilities shown in Fig. [10] have the following values: 

1 l + 2(tf(iV + 3)-^(3)) 



and 



2 1 + W(JV + 2) - (3) 
V(N + 3) - + 



^(N + 2) - + 

for $k = 1, 2, \dots, N$. The steady state probabilities for the "partial" $\epsilon$-machines can be
computed in the same way as those for the exact one, and thus the information quasi-quantities are
equal to the corresponding Shannon entropies. On the other hand, the value of loss is found in a
straightforward fashion using the general expression (13) and taking into account that the map
$g: M^* \to M_k$ in this case has the form $g(j) = j$ if $j < k$ and $g(j) = k$ if $j \ge k$, for the
"partial" $\epsilon$-machine $M_k$ with $k + 1$ nodes. The resulting dependence of loss (in the
convention $L_0 = L(M_0) = 1$ and setting $N = 1000$) on the information quasi-quantity is shown in
Fig. 11. The values of the quasi-information complexity $C_\gamma$ for different values of $\gamma$
can be found directly from this dependence. Some of these values are shown in Table V.

FIG. 10. Sequential approximations to the exact $\epsilon$-machine of Example B.

FIG. 11. Loss vs. information quasi-quantity for Example B.

TABLE V. Quasi-information complexity $C_\gamma$ for some values of $\gamma$ for Example B.

Now, consider the $\epsilon$-machine shown in Fig. 12. We will refer to this example as Example C. As
can be seen from the figure, this $\epsilon$-machine has $N$ states, where $N$ is a (large) even
number, and the transition probabilities have the form $\Pr(i \to i+1) = \epsilon_i$,
$\Pr(i \to i) = 1 - \epsilon_i$ for $i = 1, \dots, N - 1$, $\Pr(N \to 1) = \epsilon_N$,
$\Pr(N \to N) = 1 - \epsilon_N$. Any transition to an even numbered state (either from an odd numbered
state or from itself) outputs 1, and any transition to an odd numbered state outputs 0. It is easy to
see that the steady state probabilities for this system have the form
$P(i) = \frac{\epsilon_1}{\epsilon_i} P(1)$ for $i = 2, \dots, N$. If one sets
$\epsilon_i = 1 - \frac{1}{bi}$, where $b > 1$ is a real number, then a simple computation shows that
$P(i) = \frac{B}{\epsilon_i}$, $i = 1, \dots, N$, where
$B = \left[ N + \frac{1}{b}\left( \psi(N + 1 - \tfrac{1}{b}) - \psi(1 - \tfrac{1}{b}) \right) \right]^{-1}$.

Let the incomplete information about the system be represented by the $\epsilon$-machines shown in
Fig. 13. The first of these is obtained, just like in the original example, by simply estimating the
overall probabilities of 1 and 0 in the output assuming independence of output symbols (thus an
absence of any "information processing" by the system). A simple calculation gives the following
value for the probability $p_1$:

$$p_1 = \frac{1}{2} \cdot \frac{N + \frac{1}{b}\left( \psi(\tfrac{N}{2} + 1 - \tfrac{1}{2b}) - \psi(1 - \tfrac{1}{2b}) \right)}{N + \frac{1}{b}\left( \psi(N + 1 - \tfrac{1}{b}) - \psi(1 - \tfrac{1}{b}) \right)},$$

and $p_0 = 1 - p_1$. The second $\epsilon$-machine in Fig. 13 is obtained by mapping all odd numbered
states of the exact $\epsilon$-machine into the state $O$ and all even numbered states into the state
$E$. The transition probabilities between states $O$ and $E$ are as shown in Fig. 13, where

Po Nb+*(±N+i- b -£)-*(i- b -£y 

and Pi = 1 — Pq , Pq = 1 — pf, respectively. Any transition into state O is accompanied by an 
output of 0, and any transition into state E - by an output of 1. 

If one sets the parameter $b$ equal to 3 and $N = 1000$, the statistical complexity of the system
is found to be equal to 9.97. On the other hand, for the same values of parameters $b$ and $N$,
the losses corresponding to the two "partial" $\epsilon$-machines shown in Fig. 13 are
$L(M_0) = 0.98$ and $L(M_1) = 0.008$, with their information quasi-quantities (Shannon entropies)
being equal to 0 and 1.0, respectively. It follows that $C_{0.008} = 1.0$ for this system (or,
rather, that 1.0 is an upper bound on $C_{0.008}$, as explained earlier).
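The numbers quoted for Example C can be approximately reproduced by direct summation. The sketch below rests on assumptions that are inferred rather than stated in the original: the advance probabilities are taken to be $\epsilon_i = 1 - 1/(bi)$, the steady-state probabilities are proportional to $1/\epsilon_i$, an odd state emits 1 when it advances and 0 when it stays, an even state emits 1 when it stays and 0 when it advances, and each partial machine predicts with the probability-weighted average emission probability of the states it lumps together. Names are hypothetical and logarithms are base 2.

```python
import math

def h2(x):
    """Binary entropy in bits, with the convention 0 log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

b, N = 3.0, 1000
states = range(1, N + 1)
eps = {i: 1.0 - 1.0 / (b * i) for i in states}      # assumed form of Pr(i -> i+1)

# Circular flow balance: P(i) * eps_i is the same for every state.
norm = sum(1.0 / eps[i] for i in states)
P = {i: (1.0 / eps[i]) / norm for i in states}

def prob_one(i):
    """Probability that the next observed symbol is 1 when the machine is in state i."""
    return eps[i] if i % 2 == 1 else 1.0 - eps[i]

c_mu = -sum(P[i] * math.log2(P[i]) for i in states)   # statistical complexity (7)
h_mu = sum(P[i] * h2(eps[i]) for i in states)         # entropy rate (8)

# M_0: a single state predicting with the overall probability of observing 1.
p_one = sum(P[i] * prob_one(i) for i in states)
loss_m0 = h2(p_one) - h_mu

# M_1: two states, O (all odd states lumped) and E (all even states lumped).
p_odd = sum(P[i] for i in states if i % 2 == 1)
p_even = 1.0 - p_odd
q_odd = sum(P[i] * prob_one(i) for i in states if i % 2 == 1) / p_odd
q_even = sum(P[i] * prob_one(i) for i in states if i % 2 == 0) / p_even
loss_m1 = p_odd * h2(q_odd) + p_even * h2(q_even) - h_mu
info_m1 = h2(p_odd)

print("statistical complexity:", round(c_mu, 2))
print("L(M_0):", round(loss_m0, 3), " L(M_1):", round(loss_m1, 4), " I(M_1):", round(info_m1, 3))
```

With these assumptions the two-state machine removes almost all of the excess loss at a cost of about one bit, which is the effect discussed in the observations that follow.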

FIG. 12. Exact "circular" $\epsilon$-machine for Example C.

FIG. 13. First two approximations, $M_0$ and $M_1$, to the "circular" $\epsilon$-machine of Example C.

We can now make some interesting observations.

• Comparing Examples A and B, we notice that the "full" quasi-information complexity $C_0$
(which is equal to statistical complexity) in Example B is significantly higher (7.8 vs 2.7),
reflecting higher steady state probabilities of higher numbered states for Example B.

• The quasi-information complexity values for $\gamma > 0$ are also higher in the absolute sense
but are about the same in the relative (measured as fractions of $C_0$) sense for Example B,
compared to Example A. The latter observation can likely be explained by the two $\epsilon$-machines
sharing the same topology.

• If we now compare Examples A and C, we can see that the statistical complexity of the
system in Example C is significantly higher than that in Example A (10.0 vs 2.7), again
reflecting higher steady state probabilities of a larger number of states in Example C.

• On the other hand, the quantity $C_{0.01}$ is actually lower for the "circular" system in
Example C (1.0 vs. 1.6). Informally speaking, this means that if a relative 1% difference in
prediction quality is immaterial to the agent, the system in Example C would appear less complex
than that in Example A.

• In this particular case, the latter phenomenon has a simple explanation. Indeed, if the agent
predicts the next bit based on the hypothesis of an i.i.d. distribution (which corresponds to the
agent using the simplest e-machine M_0, possessing no "structure"), the resulting error is a lot
larger than that afforded by the knowledge of the "full" e-machine (complete information).
In fact, the bits appear to be almost "unpredictable" (as the estimated probability of 1 is
close to 0.5). The situation changes drastically if the agent is able to get access to M_1 with
two causal states, i.e. is able to realize that in this system 0 is followed by 1 and vice versa
with high probability. The knowledge of M_1 allows the agent to predict nearly as well as if
he had access to the full e-machine M* with 1000 causal states. Put slightly differently, once
the realization is made that 0 and 1 symbols "mostly" alternate in this system, the rest of
the information just introduces small corrections to this dominant behavior.

• Summarizing, one can say that to an agent not concerned (for whatever reason) with small
(1% or less) "residual" losses, the system from Example C would appear as genuinely less
complex than that in Example A, in spite of a much larger statistical complexity in
Example C.
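
The gap between the i.i.d. hypothesis and the two-state description can be illustrated numerically. The sketch below is not the system of Example C: it uses a hypothetical stand-in source that flips the previous bit with a fixed probability (0.95 is an arbitrary illustrative value), and it scores predictors by average log-loss in bits per symbol, which is an assumption here and not necessarily the loss used to obtain L(M_0) and L(M_1). All function names are introduced for illustration only.

```python
import math
import random

def simulate_mostly_alternating(n, flip_prob, seed=0):
    """Hypothetical stand-in source: each bit differs from the previous
    one with probability flip_prob (not the process of Example C)."""
    rng = random.Random(seed)
    bits = [0]
    for _ in range(n - 1):
        prev = bits[-1]
        bits.append(1 - prev if rng.random() < flip_prob else prev)
    return bits

def _clamp(p, eps=1e-12):
    """Keep an estimated probability away from 0 and 1 to avoid log(0)."""
    return min(max(p, eps), 1.0 - eps)

def avg_log_loss_iid(bits):
    """Mean log-loss (bits/symbol) of a predictor that assumes independent
    symbols and uses only the overall frequency of 1."""
    p1 = _clamp(sum(bits) / len(bits))
    return -sum(math.log2(p1 if b else 1.0 - p1) for b in bits) / len(bits)

def avg_log_loss_two_state(bits):
    """Mean log-loss (bits/symbol) of a predictor that conditions only on
    the previous symbol via an estimated flip probability."""
    pairs = list(zip(bits, bits[1:]))
    q = _clamp(sum(1 for a, b in pairs if a != b) / len(pairs))
    return -sum(math.log2(q if a != b else 1.0 - q) for a, b in pairs) / len(pairs)

bits = simulate_mostly_alternating(20000, flip_prob=0.95, seed=0)
print("i.i.d. predictor   :", round(avg_log_loss_iid(bits), 3))
print("two-state predictor:", round(avg_log_loss_two_state(bits), 3))
```

For such a source the i.i.d. predictor pays close to one bit per symbol, since the overall frequency of 1 is near 0.5, while the two-state predictor pays roughly the binary entropy of the flip probability. The qualitative picture matches the explanation above: most of the predictive gain comes from recognizing the alternation.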

Informally speaking, an agent observing the system output in Example C would quickly notice
that 0's and 1's mostly alternate but that short strings of the same symbol sometimes occur. This
observation would let the agent conclude that the system can be described reasonably accurately
by an e-machine of the form M_1 presented earlier. It would take the agent, generally speaking, a
lot more effort to discern finer details: namely, that the lengths of these short strings of the same
symbol tend to change just a little as time passes. The discovery of these fine details, though, would
improve the agent's predictive ability only very slightly. One could say that, for this particular
system, it is relatively easy to understand its behavior "almost fully" but a lot more difficult to fill in
the remaining small details. The initially very steep C_γ curve that flattens out drastically reflects that.
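
The shape of this curve can be made concrete with the three machines discussed for Example C. The sketch below computes an upper bound on C_γ as the smallest information quasi-quantity among candidate machines whose loss does not exceed γ. It is only an upper bound because other machines, not listed here, might do better. The values for M_1 are those quoted in the text; treating M_0 as having zero entropy (a single state) and the full machine M* as having zero loss and entropy 9.97 are assumptions made for this illustration, as is the function name `complexity_upper_bound`.

```python
# (name, information quasi-quantity in bits, loss) for Example C.
# ASSUMPTIONS: M0 is a single-state machine (entropy 0) and the full
# machine M* achieves zero loss with entropy equal to the statistical
# complexity 9.97.
CANDIDATES = [
    ("M0", 0.0, 0.98),
    ("M1", 1.0, 0.008),
    ("M*", 9.97, 0.0),
]

def complexity_upper_bound(gamma, candidates=CANDIDATES):
    """Smallest information quantity among candidates with loss <= gamma.

    Returns None when no listed candidate achieves the requested loss.
    """
    feasible = [info for _, info, loss in candidates if loss <= gamma]
    return min(feasible) if feasible else None

# Tabulate the step-shaped bound for a few loss levels.
for gamma in (1.0, 0.98, 0.1, 0.01, 0.008, 0.001, 0.0):
    print(f"gamma = {gamma:<6} upper bound on C_gamma = {complexity_upper_bound(gamma)}")
```

With these inputs the bound stays at 1.0 for every loss level between 0.008 and just under 0.98 and jumps to 9.97 only when a loss below 0.008 is demanded, which is the steep-then-flat behavior described above.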

V. CONCLUSION 

The main contributions of this article can be briefly summarized as follows. 

• The notion of information-related complexity was extended from systems described by
stochastic processes to more general tasks (performed by a rational agent), including
optimization and decision making problems with uncertainty about input data.

• A general problem-oriented definition of complexity was proposed, and it was shown that it
reduces to a generalization of statistical complexity when the system output is described
by a discrete-time stochastic process and the agent's task is that of prediction.

• The degree of task completion was explicitly considered, and it was argued that the
information-related complexity should be properly understood relative to the extent of
task completion or, equivalently, to a properly defined loss. In other words, the
information-related complexity for the given task is inherently a real-valued function, and
not just a real number.



• It was argued, based on the results of [11], that the minimum information quantity
necessary to complete the task to the given degree should really be interpreted as an appropriate
quasi-quantity that takes into account the degree of difficulty of obtaining the corresponding
information in the given environment. The traditional information quantity measures (such
as Shannon entropy) are obtained from the appropriate quasi-quantity in the particular case of
maximum symmetry, when either all different "pieces" of information are identical (with
respect to the degree of difficulty of their acquisition by the agent) or their differences are
unknown.

Let's discuss some of these points in an informal way. A more complex system is one that
"takes a lot of work" for a rational agent to fully understand and, moreover, still takes a lot
of work to understand "almost fully". By contrast, a system that is "easy" to understand to a
large degree but for which it is very hard to "get the rest" (a system for which the C_γ curve
is initially very steep but flattens out close to zero loss) would appear to be of low complexity to
all but "very discerning" agents. One might ask: what about the true "objective" complexity of a
system, which (being objective) is independent of any and all agents? We believe that this question
might be more difficult and deeper than it appears. One possible answer is that it depends on the
point of view and the level of inquiry. The same system can be complex from one point of view and
much less so from another. Additionally, it can be complex at a high level of detail but much less
complex at "lower resolution". For instance, in natural sciences, a system's ability to "innovate",
or spontaneously create new quality, could be due to various levels of its structural detail. Hence,
its appropriate complexity would also depend on that. Generally speaking, it is our opinion that
the science of complexity still has more questions than answers, and it would be of use, for the
sake of clarity, to make explicit some implicitly used or assumed aspects of the overall problem.
This is one of the main goals of the present article.